Abstract

In this paper, we demonstrate the possibility of predicting people's
hometowns from the geotagged photos they post on the Flickr website.
We employ Kruskal's algorithm to cluster the photos taken by a user and
predict the user's hometown. Our results suggest that using the social
profiles of photographers allows researchers to predict the locations of
their photos with higher accuracy. This, in turn, can improve on previous
methods that were based purely on the visual features of photos pQ.



1 Problem Definition: estimating place of living 

Suppose we have information about the places that a user u has visited
during a period of time T. We denote each visited location l_i by a triple
(x_i, y_i, t_i), where x_i and y_i are the latitude and longitude of the
visited place and t_i is the time of the visit. Our problem is to predict
user u's hometown given a sequence of n visited locations. For real data,
we use the information available on the Flickr website 0. On Flickr, users
can upload photos and share them with other people in the community. Users
can also have a profile page where they post personal information such as
their name, hometown, gender, occupation, and list of friends. One
interesting feature of Flickr is that it allows users to add tags to each
uploaded photo. More interestingly, users can geotag their photos by
explicitly specifying the location where a photo was taken. In this paper,
we assume that users only upload photos that they have taken themselves.
Thus, the geotag information of a user's photos can be used as a proxy for
that user's mobility.
2 Related Work



Hayes et al. have studied Flickr photos from an image processing point of
view. They proposed an image recognition algorithm that tries to predict
the location of a photo from its visual features pQ. Kalogerakis et al.
extended Hayes' work by considering not only the visual features of photos
but also truncated Levy flight models of human mobility [5]. Kleinberg [3]
and Nowell [4] have independently shown that the probability of friendship
between a pair of users (u, v) drops as the geographical distance d(u, v)
between them increases. Motivated by their work, Backstrom et al. proposed
an algorithm for predicting a person's hometown using only information
about their friends' hometowns 0.



3 Motivation 

To the best of our knowledge, no work in the mobile computing area has
used Flickr data to study human mobility models. In this work, we show
that Flickr data can be used to study human mobility patterns. Estimating
a user's place of living from their uploaded content is an interesting
problem with applications in social computing. Furthermore, our work
demonstrates the effectiveness of using people's social information for
predicting the location of a given photo. We believe that this method can
enhance previous algorithms that were based on the visual features of a
single photo or a set of photos.



4 How far do people take their photos from their 
hometowns? 

To collect the Flickr photos and users' social profiles, we developed
several Python scripts that use the Flickr API (see footnote 2) to crawl
the requested data from the Flickr website. In this paper, we take a random
user u and collect all geotagged photos uploaded by u's friends. We also
collect the hometown information of each of u's friends. Our chosen user u
has 31 friends who have geotagged photos and have also reported their
places of living on their profile pages. Together, these 31 people have
uploaded 21219 geotagged photos. First, we compute the distance of each
uploaded photo from its photographer's place of living. Figure 1 shows the
probability distribution of the distances between photos' locations and
their photographers' hometowns. The distribution follows a power law with
an exponent of -2.38, which is similar to the human mobility model proposed
by Gonzalez et al., obtained by analyzing cellphone data [6]. This
power-law distribution shows that people are more likely to take photos in
places close to their home city, as we would intuitively expect. Even if
this observation may seem natural, it gives us an effective way to predict
where people live by analyzing the content they upload to the virtual
world.

2 http://www.flickr.com/services/api/

Figure 1: Distance probability distribution
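The distance analysis in this section can be sketched in Python as follows. This is a minimal illustration, not the scripts used for the paper: the haversine formula, the logarithmic binning, the least-squares fit on the log-log curve, and all function names are assumptions, since the text does not state how the distances or the exponent were computed.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed spherical Earth)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def log_binned_density(distances, bins_per_decade=4):
    """Return (bin_centre, probability_density) pairs over logarithmic bins."""
    positive = [d for d in distances if d > 0]  # log bins need d > 0
    counts = Counter(
        math.floor(math.log10(d) * bins_per_decade) for d in positive
    )
    result = []
    for b in sorted(counts):
        lo = 10 ** (b / bins_per_decade)
        hi = 10 ** ((b + 1) / bins_per_decade)
        density = counts[b] / (len(positive) * (hi - lo))
        result.append((math.sqrt(lo * hi), density))
    return result


def loglog_slope(points):
    """Least-squares slope of log10(density) against log10(distance)."""
    xs = [math.log10(x) for x, _ in points]
    ys = [math.log10(y) for _, y in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx
```

Given a list of photo coordinates and the photographer's reported hometown, one would call `haversine_km` for each photo, pass the resulting distances to `log_binned_density`, and read the power-law exponent off `loglog_slope`. A straight-line fit on binned data is only a rough estimate of the exponent.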

5 Hometown Predictor Algorithm 

Building on the observation from the previous section, we propose a simple
but effective method for predicting people's hometowns by analyzing their
uploaded photos. Although people are more likely to take photos in places
close to their home, they also take photos when they travel to places far
from their home cities or countries. Therefore, we need a clustering
approach to group photos that are geographically close to each other. In
this paper, we employ Kruskal's algorithm to cluster photos based on their
geotag information. Assume a user u visits at least k different places
(e.g. k different cities) during the time interval T. We need to find the
k clusters that represent these k geographically distinct locations on the
Earth. Recalling the power-law distribution of photos' locations, we expect
that the cluster with the highest density (i.e. the maximum number of
photos) represents the user's hometown with high probability. Therefore, we
estimate the latitude and longitude of user u's place of living by taking
a simple average over the locations of all of u's photos that fall inside
the cluster with the highest density. Figures 2 and 3 show the locations of
photos taken by four different people (red diamonds), their reported
hometowns (black circles), and the hometowns estimated by our algorithm
(green squares). As these figures show, the photos can have a very wide
geographical distribution. For instance, the users osiatynska and craignos
have taken photos on three different continents.

(a) osiatynska (b) craignos

Figure 2: Locations of photos taken by two different users

(a) crosslens (b) koltregaskcs

Figure 3: Locations of photos taken by two different users
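The clustering step described in this section can be sketched as follows. The text does not say how Kruskal's algorithm is turned into clusters, so this sketch makes an assumption: it runs Kruskal's minimum spanning tree construction over all photo pairs and stops once k connected components remain, which is equivalent to single-linkage clustering. The function names, the haversine distance, and the way k is supplied are likewise hypothetical.

```python
import math


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    dphi = phi2 - phi1
    dlmb = math.radians(q[1] - p[1])
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(min(1.0, a)))


def kruskal_clusters(points, k):
    """Merge closest pairs first (Kruskal) until k components remain."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first: O(n^2) memory, fine for small n.
    edges = sorted(
        (haversine_km(points[i], points[j]), i, j)
        for i in range(len(points))
        for j in range(i + 1, len(points))
    )
    components = len(points)
    for _, i, j in edges:
        if components <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two different clusters
            parent[ri] = rj
            components -= 1

    clusters = {}
    for idx, point in enumerate(points):
        clusters.setdefault(find(idx), []).append(point)
    return list(clusters.values())


def predict_hometown(points, k):
    """Average the coordinates of the cluster containing the most photos."""
    largest = max(kruskal_clusters(points, k), key=len)
    lat = sum(p[0] for p in largest) / len(largest)
    lon = sum(p[1] for p in largest) / len(largest)
    return lat, lon


# Synthetic example: three nearby photos, one isolated photo, two more nearby.
photos = [(52.20, 21.00), (52.30, 21.10), (52.25, 20.95),
          (40.70, -74.00), (35.70, 139.70), (35.60, 139.80)]
hometown = predict_hometown(photos, k=3)
```

Averaging latitudes and longitudes directly follows the "simple average" in the text. It is adequate for a compact cluster, but it would give misleading results for a cluster that straddles the 180° meridian.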

As mentioned earlier, we estimated the hometowns of 31 different people.
To evaluate the performance of our predictor, we compute the distribution
of the distance error of the predicted locations. Figure 4 shows the
probability distribution of the errors of our predictor for these 31
people. In 70% of the cases, our algorithm predicted the person's place of
living with low error.
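The error evaluation can be sketched as below. The text does not define what counts as "low error", so the threshold is a free parameter here. The function names and the use of the haversine distance as the error measure are assumptions made for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(min(1.0, a)))


def prediction_errors_km(predicted, reported):
    """Distance between each predicted hometown and the reported one."""
    return [haversine_km(*p, *r) for p, r in zip(predicted, reported)]


def fraction_within(errors, threshold_km):
    """Share of users whose prediction error is at most threshold_km."""
    return sum(e <= threshold_km for e in errors) / len(errors)
```

With one predicted and one reported (lat, lon) pair per user, `fraction_within(prediction_errors_km(predicted, reported), threshold_km)` gives the share of users predicted within the chosen radius.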



6 Conclusion 

This work shows the effectiveness of using geotagged content for predicting
people's places of living. Although our results might seem natural given
our daily experience, they highlight the importance of social profiles in
predicting a photo's location. In other words, they demonstrate the power
of using social profiles for estimating the location of photos.

Figure 4: Error probability distribution